Abstract — We propose the application of independent component analysis (ICA), via unsupervised neural networks, to authenticity protection for multimedia products. We give an overview of the current state of multimedia authenticity protection, including the requirements of various multimedia applications, current approaches to the problem, and the robustness of the approaches. For watermark security, a covert independent-component watermarking signal can serve as a “vaccination” against a dormant digital “bacterium” protecting the multimedia data. An unauthorized removal of the watermark triggers the bacterium's payload, which then degrades the quality of the unauthorized data. We argue that such digital bacteria meet all the established requirements for beneficial virus-like programs, and that their payload would affect only pirated media. We show how these new approaches contribute to a flexible, robust, and secure system for protecting the authenticity of multimedia products.

Keywords — Multimedia watermark, copyright protection, Internet commerce, independent component analysis, unsupervised neural networks.


I. Introduction

The ubiquitous piracy of copyrighted digital music files over the Internet threatens the livelihood of content-providing businesses as well as the creative artists who hold the copyright. Beyond the legal and practical issues of copyright enforcement, several technology factors contribute to piracy. The major factor is the availability of easy-to-use, Internet-based peer-to-peer file-sharing technology. Another is the compact format and high fidelity of Moving Picture Experts Group-1 (MPEG-1) Layer-3 (MP3) digitally encoded music files. As a result, MP3-encoded music files can be obtained quickly from peers over high-speed network connections.

The application of independent component analysis (ICA) to digital watermarking, for the purpose of multimedia authenticity protection, was first introduced in (Noel and Szu, 2000). This has foundations in previous work on ICA for intelligent sensory processing, namely (Szu, 1999a), (Szu, 1999b), and (Quan, Szu, and Markovitz, 2000).

Many other technical strategies have been proposed and pursued to reduce the piracy problem. One approach is to embed a digital code or watermark in the music itself. The watermark can be chosen to be imperceptible to the listener or deliberately chosen to corrupt the musical audio signal.

The ideal watermarking technique for digital music files has several desirable attributes, including (1) compatibility with the popular MP3 format, (2) resistance to watermark removal by the consumer, and (3) complete recovery of the music from the watermarked signal with authorized/restricted computer codes. We explore the efficacy of a novel approach (linear mixing and blind demixing with ICA) for watermarking digital music files to fulfill these goals.

Earlier research on watermarking via ICA concentrated either on detection, (Szu, 1999a), (Szu, 1999b), (Hartung and Kutter, 1999), (Swanson, Kobayashi, and Tewfick, 1998), and (Jessop, 1999), or on application to imagery, (Noel and Szu, 2000), (Quan, Szu, and Markovitz, 2000), (Kopriva and Szu, 2003), (Barnett, 1999), and (Seok and Hong, 2001). Instead, this paper focuses on some of the subtle and practical issues of ICA for watermarking digital music files, especially robustness with respect to lossy MP3 compression.
Fig. 1. Standard watermarking model.


[Fig. 2 block diagram. Labels recoverable from the source: Multimedia, Advertisement, and Hidden Mark inputs; Mix; Mixed Data; Web; ICA Demix; demix key; hidden mark key; Media Player; outputs Multimedia, Advertisement (visible mark), and Hidden Mark (visible only when proving copyright).]

Fig.  2.  ICA  demix  enabled  model. 
Earlier work (Noel and Szu, 2000) described two models for watermarking of multimedia products: (1) a standard model and (2) a model for a neural-net ICA demix-enabled media player (Fig. 1 and Fig. 2, respectively). The model in Fig. 1 uses a standard media player; therefore, the hidden watermark is invisible. On the other hand, the model in Fig. 2 uses an ICA demix-enabled media player to recover the hidden watermark when proving copyright. That work also points out a critical issue: ICA needs to be able to demix signals that have been subjected to lossy compression. It demonstrates that ICA neural networks can blindly demix media signals with relatively little distortion after the application of lossy wavelet compression.

In this paper, the idea is to linearly mix a watermark audio signal with the music signal, MP3 encode and decode the resulting signal, then attempt to demix the watermark signal from the music. The novelty of this method makes removal of the watermark difficult for the unsophisticated consumer. The value of restricted computer codes to restore the music from a watermarked signal will depend in part on the quality of the music after blind demixing with ICA (Yu, Satter, and Ma, 2002), (Yu and Satter, 2002), (Gonzalez-Serrano, Molina-Bulla, and Murillo-Fuentes, 2001), (Bell and Sejnowski, 1995), (Bell and Sejnowski, 1996), (Amari, Chichocki, and Yang, 1996). We measure the watermark contamination of the music after MP3 encoding/decoding and blind demixing with ICA.

In ICA, sensed signals are modeled as statistically independent components, which are then linearly mixed. Here independence is over all orders of statistics, not just second-order correlations. Neural networks with unsupervised learning rules are able to estimate the signal mixing to high accuracy, so that the original signals (independent components) can be recovered through inverse mixing.

The ability to blindly demix signals enables a novel form of security against those who may attack multimedia watermarks. One of the independent component signals in the mixed data stream could be a copy of an overt (visible or audible) authenticity mark on the host signal. A multimedia player enabled with an ICA neural network could then blindly demix the overt mark copy and determine whether the mark is still present in the host data.

If the overt authenticity mark has been removed, then a program is triggered that degrades the quality of the unauthorized data. We call this program an electronic “bacterium,” as opposed to a “virus,” in the sense that it does not replicate indiscriminately. Moreover, there exists a “vaccine” against this electronic bacterium, i.e., the presence of the authenticity mark. We also investigate two key requirements of the approach: (1) the robustness of ICA digital watermarking with respect to lossy compression and (2) the dynamic range of the digital watermark as clutter versus original music, for a signal coded in the MP3 format.

In the next section, we review ICA theory and introduce the concept of multimedia watermarks via ICA. Section III then applies ICA watermarks within a full framework for protecting multimedia authenticity. Sections IV through VII then describe experiments testing the robustness of the approach for MP3 audio.

II. Blind Demixing of Multimedia Data with ICA Neural Networks

Existing  watermarking  systems  are  essentially  non-blind, 
in  the  sense  that  they  require  original  versions  of  both  the 
host  data  and  watermark  keystream  in  order  to  extract  the 
watermark.  This  may  be  reasonable  for  applications  that 
merely  verify  the  existence  of  the  watermark  for  proving 
ownership.  However,  non-blind  schemes  are  inadequate  for 
our  application,  in  which  embedded  data  is  extracted  on  the 
consumer  side,  not  the  producer  side. 

A straightforward way to combine host and watermark signals is to combine them linearly. The signals could then later be demixed given the linear mixing coefficients. However, this provides little in the way of flexibility or security. It would be much better if the signals could be demixed without knowledge of the original mixing coefficients, or even of the original signals themselves. Thus, we are faced with the problem of blind demixing of source signals.

In ICA, the source signals are modeled as statistically independent signal components, which have been subsequently mixed linearly. Independence is defined such that the joint probability density of the signal components can be factorized as the product of the marginal densities. Thus, independence is over all orders of statistics, not just second-order correlations.

Neural networks with unsupervised learning are able to estimate the signal mixing to high accuracy, so that the original signals (independent components) can be recovered through inverse mixing. The learning rules are based on maximizing the degree to which the independent components are non-Gaussian. Possible measures of non-Gaussianity are the absolute value of kurtosis (fourth-order cumulant), or negentropy (a slightly modified version of differential entropy).
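As a rough illustration of kurtosis as a non-Gaussianity measure, the following minimal sketch estimates the excess kurtosis of three synthetic signals. The sample size, random seed, and choice of test distributions are assumptions made for illustration only; they do not come from the experiments in this paper.

```python
import numpy as np


def excess_kurtosis(x):
    """Normalized fourth-order cumulant: E[x^4] / E[x^2]^2 - 3 for zero-mean x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0


rng = np.random.default_rng(0)      # fixed seed, chosen arbitrarily
n = 200_000                         # illustrative sample size

k_gauss = excess_kurtosis(rng.standard_normal(n))      # theory: 0
k_uniform = excess_kurtosis(rng.uniform(-1.0, 1.0, n))  # theory: -1.2 (sub-Gaussian)
k_laplace = excess_kurtosis(rng.laplace(size=n))        # theory: +3 (super-Gaussian)

print(f"gaussian {k_gauss:+.2f}, uniform {k_uniform:+.2f}, laplace {k_laplace:+.2f}")
```

The absolute value is used as the measure because both sub-Gaussian (negative kurtosis) and super-Gaussian (positive kurtosis) signals count as non-Gaussian.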

ICA assumes that there are multiple sensors, each sensing a different mix of the independent components. Thus it extends conventional single-sensor processing of signals to multiple sensors, analogous to the multiple sensors (eyes and ears) in humans. The unsupervised ICA neural networks are able to simultaneously compare sensor outputs, extracting noise so that only coherent signals remain. In the unsupervised learning rule, there is no specific desired output other than white Gaussian noise. This is consistent with the idea that what is not noise must be signal.

More formally, assume $n$ mutually independent signal sources at each time instant, represented as the vector $\mathbf{s} = [s_1(t), s_2(t), \ldots, s_n(t)]^T$, which are then linearly mixed with an unknown mixing matrix $A$ onto a vector of received signals $\mathbf{x} = [x_1(t), x_2(t), \ldots, x_m(t)]^T$, i.e.,

$$\mathbf{x} = A\mathbf{s}, \qquad (1)$$

where $A$ is an $m \times n$ scalar matrix of full rank. Now let a neural processor with weight matrix $W$ and nonlinear contrast function $g$ form the output $\mathbf{y}$:

$$\mathbf{u} = W\mathbf{x}, \qquad (2)$$

$$\mathbf{y} = g(\mathbf{u}), \qquad (3)$$

where $W$ is an $n \times m$ scalar weight matrix. Further, let the following additional assumptions hold:

1. The function $g$ is the cumulative distribution function of the source probability density function, $p(s_i(t))$.

2. The number of received signals, $m$, is at least equal to the number of source signals, $n$.

3. At most one source is normally distributed.

4. The receiver noise is negligible relative to the source signal power.

When speed of media processing is important, we recommend pixel-parallel Lagrange Constraint Neural Networks (LCNN), which solve blind source separation pixel by pixel in parallel. It has been shown (Szu, 1999a), (Szu, 1999b) that minimization of the Helmholtz free energy $H = E - T_0 S$ will achieve blind signal source separation, where $E$ is the Lagrange-constraint first-order estimation error

$$E = \lambda([A]\mathbf{s} - \mathbf{x}) = \mu([W]\mathbf{x} - \mathbf{s}).$$

Here $T_0$ is the constant temperature of the thermal reservoir, and $S$ is the Shannon-Boltzmann entropy. Two LCNN theorems are reviewed as follows, the first one without data, the other with data.

Theorem 1. An equal partition law at maximum entropy: Without being biased by pixel data, the equilibrium distribution at the maximum of entropy is an equal partition law as follows:

$$S = -K_B \sum_{j=1}^{M} s_j \log s_j + K_B(\mu_0 + 1)\Big(\sum_{j=1}^{M} s_j - 1\Big).$$

Next, $\partial S / \partial s_j = -K_B(\log s_j + 1) + K_B(\mu_0 + 1) = 0$ yields $s_j = \exp(\mu_0)$, and imposing the unit sum $\sum_{j=1}^{M} s_j = 1$ we find $s_j = 1/M$ for the equal partition distribution. Q.E.D.

By computation of the Amari natural gradient with respect to the weight matrix $W$, and the distance metric $[A]^T[A]$, we have


Theorem 2. ANN sigmoid distribution: Setting the derivative to zero for the minimum solution,

[Equation garbled in the source scan: the stationarity condition $\partial H / \partial s_j = 0$.]

reproduces, by the unit-sum constraint $\sum_j s_j = 1$, the ANN sigmoid threshold (without assuming it for ICA post-processing):

[Eq. (4) is missing from the source scan.]
The Lagrange constraint vector can be obtained by methods found in standard physics textbooks. Thus, we achieve single-pixel blind source separation, i.e., the output does not involve the pixel ensemble average that leads to a permuted and scaled version of the original mutually independent signal sources $\mathbf{s}$.

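On the other hand, a batch-mode operation uses the feed-forward weight update rule (Bell and Sejnowski, 1995), which has the general form

$$\frac{\partial S(\mathbf{y})}{\partial W} = (W^T)^{-1} + \frac{\partial p(\mathbf{u})/\partial \mathbf{u}}{p(\mathbf{u})}\,\mathbf{x}^T. \qquad (5)$$

Here $p(\mathbf{u})$ is a source probability density function. As (Amari, Chichocki, and Yang, 1996) shows, post-multiplication of (5) by $W^T W$ gives

$$\frac{\partial S(\mathbf{y})}{\partial W}\,W^T W = \left[ I + \frac{\partial p(\mathbf{u})/\partial \mathbf{u}}{p(\mathbf{u})}\,\mathbf{u}^T \right] W. \qquad (6)$$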

In practice, we employ fixed-point FastICA Matlab code (Oja and Karhunen, 1995) to compute the gradient. Here the hyperbolic tangent function is chosen for the contrast-related function, $(\partial p(\mathbf{u})/\partial \mathbf{u})/p(\mathbf{u})$, in (6).

Fig. 3 is an example of the mixing of multimedia data, with blind demixing by an ICA unsupervised neural network. The three independent data components are the host data, an advertisement, and an authenticity mark. The data are mixed to ensure that the various components remain together for Internet distribution. The neural network with unsupervised learning estimates the mixing matrix $A$ with high accuracy, allowing blind demixing of the independent components.
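The three-component mixing and blind demixing described here can be sketched in a few lines of NumPy. This is a minimal illustration, not the Matlab code used in the paper: it is a plain symmetric fixed-point ICA iteration with a tanh nonlinearity, and the synthetic sources (stand-ins for host, advertisement, and mark), the mixing matrix `a`, the sample count, and the seeds are all assumptions chosen for the example.

```python
import numpy as np


def whiten(x):
    """Center the mixtures (channels x samples) and decorrelate them to unit variance."""
    x = x - x.mean(axis=1, keepdims=True)
    eigval, eigvec = np.linalg.eigh(np.cov(x))
    v = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T   # whitening matrix
    return v @ x, v


def fastica_tanh(x, n_iter=200, tol=1e-10, seed=0):
    """Symmetric fixed-point ICA with g = tanh; returns outputs and total demixing matrix."""
    z, v = whiten(x)
    n, t = z.shape
    rng = np.random.default_rng(seed)
    w = np.linalg.qr(rng.standard_normal((n, n)))[0]   # random orthogonal start
    for _ in range(n_iter):
        g = np.tanh(w @ z)
        g_prime = 1.0 - g ** 2
        # Fixed-point step for every row: E[z g(w^T z)] - E[g'(w^T z)] w
        w_new = (g @ z.T) / t - np.diag(g_prime.mean(axis=1)) @ w
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        s, e = np.linalg.eigh(w_new @ w_new.T)
        w_new = e @ np.diag(s ** -0.5) @ e.T @ w_new
        converged = np.max(np.abs(np.abs(np.sum(w_new * w, axis=1)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z, w @ v


# Hypothetical stand-ins for host data, advertisement, and authenticity mark.
rng = np.random.default_rng(1)
t = 20_000
s = np.vstack([
    rng.laplace(size=t),                 # "host": super-Gaussian
    rng.uniform(-1.0, 1.0, size=t),      # "advertisement": sub-Gaussian
    rng.choice([-1.0, 1.0], size=t),     # "mark": binary
])
a = np.array([[0.8, 0.3, 0.2],           # illustrative full-rank mixing matrix
              [0.4, 0.9, 0.3],
              [0.3, 0.2, 0.7]])
x = a @ s                                # x = A s, Eq. (1)

y, w_total = fastica_tanh(x)

# Each source should match exactly one output up to order, sign, and scale.
corr = np.abs(np.corrcoef(np.vstack([s, y]))[:3, 3:])
print(np.round(corr, 3))
```

The correlation matrix printed at the end should be close to a permutation matrix, which reflects the usual ICA ambiguity: the components are recovered only up to ordering, sign, and scale.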
Fig. 3. Example mixing of multimedia data, with blind demixing via ICA unsupervised neural network.

III. Independent Components for Multimedia Watermark Vaccination


It is important that ICA demixing be robust with respect to various forms of signal processing. Being linear, ICA demixing is invariant with respect to linear filtering. It is also invariant with respect to various changes in sampling, which cause synchronization problems with most other watermarking techniques. Such sampling changes include cropping, line dropping, or changing the sample rate. In the ICA model, each sample point is a mix of two independent components, and the mix is the same for all samples.

It is also critical that ICA neural networks be able to demix signals that have been subjected to lossy compression. In general, compression can occur either before or after mixing. Compression before mixing yields nearly lossless demixing, and avoids costly decompression of previously compressed data. As we see in Fig. 4 for audio data, ICA neural networks can blindly demix signals with minimal distortion even after they have been subjected to the nonlinear transformation of lossy wavelet compression.

Fig. 4. Robustness of ICA neural net blind demixing with respect to nonlinear wavelet audio data compression.

In Fig. 5, after the host and watermark images are mixed, the mixtures are subjected to DWT compression. ICA unsupervised neural nets are still able to blindly demix host and mark images, despite the nonlinear transformation of mixed images via compression. In fact, there is little visual difference compared to the compressed versions with no mixing involved.

Fig. 5. Robustness of ICA neural net blind demixing with respect to nonlinear wavelet image data compression.


Once the data is demixed, it is open to attack by the consumer. For example, the consumer could attempt to remove a visible overt company logo from the demixed multimedia product, to claim ownership. We do not have the luxury of a DVD-like approach, where there is industry-wide cooperation and all multimedia players are trusted. Our approach needs to be completely autonomous.

Our approach to making product authenticity secure against someone who has the demix key is essentially a watermark “vaccination” against digital “bacteria.” The purpose of the bacteria is to degrade the quality of the multimedia product if it has been tampered with, e.g., if its company logo has been removed.

The vaccination is the actual presence of the company logo, and it has already been administered in the sense that the logo is initially part of the product. Hidden data localized to the company logo serves as the flag for the logo’s presence. The level of infection (i.e., the action to take upon logo deletion) can vary from doing nothing, to degrading sound/video quality, to rendering the multimedia product unusable, depending on the choice of the product owner. The infection is localized to the unauthorized multimedia file, and spreads to no other part of the computer, since unauthorized content is directly recognizable via ICA demixing.
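The check-then-act logic of the vaccination can be sketched as follows. This is only an illustration under stated assumptions: the names `Infection`, `mark_present`, and `play` are hypothetical, the presence test is a simple normalized correlation against a reference mark with an arbitrary threshold, and additive noise stands in for whatever degradation the product owner would actually choose.

```python
import numpy as np
from enum import Enum


class Infection(Enum):
    """Hypothetical owner-selected response levels upon logo deletion."""
    NONE = 0      # do nothing
    DEGRADE = 1   # degrade sound/video quality
    DISABLE = 2   # render the product unusable


def mark_present(region, reference, threshold=0.5):
    """True if the demixed mark region still correlates with the reference mark."""
    r = region - region.mean()
    m = reference - reference.mean()
    denom = np.linalg.norm(r) * np.linalg.norm(m)
    if denom == 0.0:          # a flat (erased) region carries no mark
        return False
    return float(r @ m) / denom >= threshold


def play(host, region, reference, level, rng):
    """Return the host unchanged if vaccinated, otherwise apply the chosen infection."""
    if mark_present(region, reference) or level is Infection.NONE:
        return host.copy()
    if level is Infection.DEGRADE:
        return host + 0.1 * np.std(host) * rng.standard_normal(host.shape)
    return np.zeros_like(host)


rng = np.random.default_rng(0)
host = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)   # stand-in host audio
logo = rng.choice([-1.0, 1.0], size=256)                  # stand-in mark data
erased = np.zeros(256)                                    # logo removed by an attacker

intact = play(host, logo, logo, Infection.DEGRADE, rng)
tampered = play(host, erased, logo, Infection.DEGRADE, rng)
disabled = play(host, erased, logo, Infection.DISABLE, rng)
```

Only the file whose mark is missing is affected, which mirrors the requirement that the infection stay localized to the unauthorized content.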

Fred Cohen pioneered research in computer viruses* (Cohen, 1987). His early results showed that (1) viruses could spread unhindered, even in secured networks, (2) they could cause essentially unlimited damage with little effort by the virus writer, (3) virus detection was undecidable, and (4) many of the defenses that could be devised relatively quickly were ineffective against a serious virus writer. Despite subsequent advances in virus defense, it appears that it will always be possible to write effective viruses.

* In the early 1980s, as a Ph.D. student at the University of Southern California, Fred Cohen got the idea of self-replicating software that spreads by attaching itself to existing programs. He shared this idea with his thesis advisor Len Adleman, who pointed out the similarity to a biological virus, leading to the term “computer virus.”

For various reasons, the general public usually considers computer viruses to be malicious code with no useful purpose. However, maliciousness is neither a necessary nor a sufficient property for computer viruses. Indeed, Cohen has described the potential advantages of beneficial viruses.

It is true that there have been relatively few successful examples of beneficial viruses, despite several attempts. Bontchev argues convincingly (Bontchev, 1994) that such failed attempts violate one or more important properties that beneficial viruses should have.

In particular, the technical properties that beneficial viruses should have are (1) control, (2) recognition, (3) no resource wasting, (4) bug containment, (5) compatibility, and (6) effectiveness. Such viruses should also meet ethical/legal requirements with respect to unauthorized data modification and copyrights, as well as overcome psychological barriers involving lack of trust and the negative connotations of computer viruses.

Bontchev also describes a general model for beneficial viruses that satisfies these requirements (Bontchev, 1994). This model requires the active consent of individual users, via cryptographically strong system authentication. It also requires that the virus be self-contained and propagate as a whole, and not depend on attaching itself to a host executable file. The beneficial virus must also consume negligible resources, at least compared to the benefits it provides.

Our  proposed  digital  bacteria  fit  within  this  general  model 
for  beneficial  viruses.  They  fulfill  the  control  requirement 
because  they  only  spread  to  machines  that  have  agreed  to  the 
network  policy  with  respect  to  unauthorized  media.  They 
fulfill  the  recognition  property  because  they  can  be  easily 
ignored  by  anti-virus  scanners,  since  they  can  authenticate 
themselves  as  a  known  beneficial  virus. 

Our digital bacteria meet the no-resource-wasting criterion, since only a single instance of the bacterium is present on an infected machine, provided the media degradation process is efficient. Through public-key cryptography for authentication, digital bacteria updates can be strictly controlled, addressing the bug-containment problem.

Because the digital bacteria are self-contained and do not modify other programs, they introduce no greater risk of incompatibility than non-viral programs. Our digital bacteria meet the effectiveness requirement, since virus-like propagation is a highly effective way to ensure multimedia authenticity protection across a participating network.

Since our proposed digital bacteria modify only unauthorized media files, there are no ethical/legal problems. The psychological barrier is largely overcome by the fact that most people would not consider such programs to be actual viruses. Some users may resent the effects of the digital bacteria if they thwart their efforts at illegally copying music or videos. But network administrators would appreciate that illegal activities are being curtailed on their network, and media content producers and artists would have definite monetary incentives.

One way to implement watermark vaccination is to have the vaccination-checking executable code downloaded along with the mixed data and multimedia player. It is then executed along with the player, as shown in Fig. 8. Multimedia data, advertisement, and mark data are mixed and secured, then embedded within the object. The neural network ICA demix media player is defined as an object method, with read-only access to mixed data. The demix key allows data to be demixed, and the mark key allows the hidden mark to be accessed for copyright proof.


Fig.  8.  Object-based  model  for  multimedia  product  protection  by  watermark 
vaccination. 


Another implementation is DVD-like, that is, via trusted players. Either the multimedia producers or a third party provides the player/protection software cost-free. This approach requires universal participation for complete protection; otherwise, there is protection only among trusted players. A third approach is to implement a DVD-like scheme in hardware, with an interface to the operating system.

IV.  Experimental  Background 

A.  Blind  Demixing  with  ICA 

The idea is to linearly mix a watermark acoustic signal with the music signal, MP3 encode and decode the resulting signal, then attempt to demix the watermark signal from the music. The novelty of this method makes removal of the watermark difficult for the unsophisticated consumer. The value of restricted computer codes to restore the music from a watermarked signal will depend in part on the quality of the music after blind demixing with ICA. We measure the watermark contamination of the music after MP3 encoding/decoding and blind demixing with ICA. The receiver noise is assumed negligible relative to the source signal power.

B.  MP3  Encoding/Decoding 

The compact MP3 encoding format is a lossy compression method. It utilizes a psychoacoustic model of human hearing to discard information (Pan, 1995), (Brandenburg and Popp, 2000). As shown in Fig. 9, human hearing is subject to auditory masking, where weak tones in the local frequency neighborhood of a much stronger tone are imperceptible.


Fig. 9. Illustration of audio noise masking (horizontal axis: frequency). Weak signals in the local frequency neighborhood of a strong tone are imperceptible.

Further,  the  human  auditory  system’s  frequency  resolution 
is  a  function  of  absolute  frequency;  perceptible  filter  widths 
are  narrow  at  the  low  end  and  significantly  wider  at  the  high 
end.  Thus,  weak  tones  adjacent  to  stronger  tones  may  be 
eliminated  by  MP3  encoding/decoding. 

Several issues arise when considering ICA for blind demixing of MP3 watermarked music. First, introduction of a watermark signal must be considered in the context of the music, so that it will not be eliminated by the lossy MP3 encoding/decoding. A second issue is consideration of the statistics of the music and watermark signals after MP3 encoding/decoding. The concern is that the statistics of both the watermark and music signals may be perturbed when selected frequency components are nulled. The altered statistics may adversely affect the ICA blind demixing.

Our strategy is to measure the watermark contamination in the music signal after blind demixing with ICA. First we measure the watermark residue in the music channel without MP3 encoding/decoding. Then the measurement is repeated after MP3 encoding/decoding. The next section discusses the methodology and equipment configuration for our measurements.

V.  Methodology 

The goal is to measure the watermark contamination in the music signal after blind demixing with ICA. First, a baseline is produced that measures the watermark residue in the music channel when the MP3 codec is not employed (Fig. 10 without the dashed block). Then the residue level is measured with the MP3 codec applied to the linearly mixed signals (Fig. 10 with the dashed block). To control the relative contributions of the watermark and music signals, the linear mixing matrix was parameterized by a single mixture angle, θ. In both cases, the mixed signals are demixed using ICA to recover permuted and scaled versions of the originals (Music’ and Watermark’).
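A single-angle parameterization of a 2 x 2 mixing matrix can be sketched as below. The exact form of the matrix is not given in the text, so the rotation matrix used here is an assumption, as are the function name `mixing_matrix`, the 1-second signal length, and the use of the exact inverse in place of ICA for the sanity check.

```python
import numpy as np


def mixing_matrix(theta_deg):
    """Assumed parameterization: a 2 x 2 rotation by the mixture angle theta."""
    th = np.deg2rad(theta_deg)
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])


fs = 44_100                         # sample rate used for the MP3 experiments
t = np.arange(fs) / fs              # 1 second of samples (illustrative length)

# Dual-tone test signals at the frequencies used in the experiments.
watermark = np.sin(2 * np.pi * 900 * t) + np.sin(2 * np.pi * 1000 * t)
music = np.sin(2 * np.pi * 2100 * t) + np.sin(2 * np.pi * 2250 * t)
sources = np.vstack([watermark, music])

a = mixing_matrix(30.0)
mixtures = a @ sources              # the two channels that would be MP3 encoded

# Sanity check with the known inverse (a rotation's inverse is its transpose).
recovered = a.T @ mixtures
max_err = np.max(np.abs(recovered - sources))
print(f"max reconstruction error with known A: {max_err:.2e}")
```

Under this assumed form, θ = 0 leaves the two signals unmixed and θ = 45 degrees mixes them with equal weight, so sweeping θ from 0 to 90 degrees covers the full range of relative contributions.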

To facilitate simultaneous linearity and residue measurements, the watermark and music signals were each implemented as a pair of tones. The Watermark signal consists of two equal-amplitude tones at 900 Hz and 1000 Hz, and the Music signal consists of two equal-amplitude tones at 2100 Hz and 2250 Hz. The linear dynamic range is determined by measuring the magnitude of the intermodulation (IM) products generated by each tone pair, relative to the magnitude of the respective tone pair. As an example, for the two-tone watermark signal at 900 and 1000 Hz, any third-order IM products would occur at 800 and 1100 Hz.
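The IM frequencies quoted for the watermark tone pair follow from the standard third-order relations 2f1 - f2 and 2f2 - f1. The sketch below computes them and then confirms, with a deliberately weak cubic nonlinearity, that spectral lines appear at those frequencies. The cubic model and its coefficient `c` are assumptions for illustration; they are not a model of the MP3 codec.

```python
import numpy as np


def third_order_im(f1, f2):
    """Third-order intermodulation frequencies of a two-tone signal."""
    return 2 * f1 - f2, 2 * f2 - f1


print(third_order_im(900, 1000))     # watermark tone pair
print(third_order_im(2100, 2250))    # music tone pair

fs = 44_100
t = np.arange(fs) / fs               # 1 s of samples, so FFT bin k sits at k Hz
f1, f2 = 900, 1000
x = np.cos(2 * np.pi * f1 * t) + np.cos(2 * np.pi * f2 * t)

c = 1e-3                             # assumed strength of the cubic distortion
y = x + c * x ** 3                   # weakly nonlinear "device under test"

spectrum = np.abs(np.fft.rfft(y)) / (fs / 2)   # single-sided amplitude spectrum
im_low, im_high = third_order_im(f1, f2)

# IM level relative to the wanted tone, in dB.
im_db = 20 * np.log10(spectrum[im_low] / spectrum[f1])
print(f"IM product at {im_low} Hz is {im_db:.1f} dB relative to the {f1} Hz tone")
```

For a perfectly linear system the lines at 800 Hz and 1100 Hz vanish, which is why their level relative to the tone pair serves as a measure of linear dynamic range.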


Fig.  10.  Mixing  and  demixing  procedure  model  with/without  MP3  codec 
block  (dashed  block). 

The MP3 codec was implemented using commercial software (Syntrillium, 2000). The following MP3 encoding options were chosen and held constant through all the experiments: 44.1 kHz sample rate, 128 kbps per channel, and stereo (no joint stereo coding).

As a check, the software MP3 decoder was compared against consumer hardware. MP3 files were burned onto CD-ROMs and played back on a Riovolt MP3/CD player through an RCA SA-155 Integrated Stereo Amplifier (Radio Shack catalog #31-5000), with spectrum measurements obtained from a spectrum analyzer (HP 3588A with opt. 001). The nominal 80 dB of linear dynamic range measured with the MP3 software codec was also observed with the hardware.

VI.  Results 

An illustration of the results of demixing for a mixture angle of 30 degrees in the absence of the MP3 codec is shown in Fig. 11. It shows more than 80 dB of isolation between the wanted signals and the residue signals. The left column is the Watermark’ channel with the wanted watermark tones at 900 Hz and 1000 Hz, and the right column is the Music’ channel with the wanted music tones at 2100 Hz and 2250 Hz.
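The isolation figure can be related to demixing accuracy with a small numerical experiment. In this sketch the mixing matrix is again assumed to be a rotation, and ICA is replaced by a demixing rotation with a deliberate angular error `error_deg`; both are illustrative assumptions, as are the helper names. The isolation is measured the same way as described above: wanted-tone level minus residue-tone level in the FFT of a demixed channel.

```python
import numpy as np

fs = 44_100
t = np.arange(fs) / fs                       # 1 s of samples: FFT bin k sits at k Hz


def tone_pair(f1, f2):
    return np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)


def rotation(theta_deg):
    """Assumed single-angle mixing matrix (2 x 2 rotation)."""
    th = np.deg2rad(theta_deg)
    return np.array([[np.cos(th), -np.sin(th)],
                     [np.sin(th),  np.cos(th)]])


def level_db(signal, freq_hz):
    """Amplitude of the spectral line at freq_hz, in dB."""
    spectrum = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)
    return 20 * np.log10(spectrum[int(round(freq_hz))] + 1e-300)


def isolation_db(channel, wanted, residue):
    """Minimum wanted-tone level minus maximum residue-tone level."""
    wanted_level = min(level_db(channel, f) for f in wanted)
    residue_level = max(level_db(channel, f) for f in residue)
    return wanted_level - residue_level


sources = np.vstack([tone_pair(900, 1000), tone_pair(2100, 2250)])
mixtures = rotation(30.0) @ sources

error_deg = 0.005                            # hypothetical demixing-angle error
demixed = rotation(30.0 + error_deg).T @ mixtures

music_isolation = isolation_db(demixed[1], (2100, 2250), (900, 1000))
expected = -20 * np.log10(np.tan(np.deg2rad(error_deg)))
print(f"measured {music_isolation:.2f} dB, predicted {expected:.2f} dB")
```

Under these assumptions, an isolation of about 80 dB corresponds to estimating the mixture angle to within a few thousandths of a degree, which gives a sense of how accurate the blind estimate of the mixing must be.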

The top panels of Fig. 12 summarize the minimum magnitude differences in dB for the case when the MP3 codec (the dashed block in Fig. 10) was absent and the mixture angle was varied from 0 to 90 degrees in 0.1-degree increments.

The bottom panels of Fig. 12 summarize the case when the MP3 codec (the dashed block in Fig. 10) was present; here the residues were evaluated from 0 to 90 degrees in 1-degree increments. In this figure, the left column shows the minimum magnitude differences in dB between the residue signal and the wanted signal in the Watermark’ channel, and the right column shows the same quantity for the Music’ channel.


[Figure panels: Watermark’ Channel (left), Music’ Channel (right).]

Fig. 11. The residues in both demixed channels.

Fig. 12. Minimum magnitude differences (dB) between the residue signal and the wanted signal are shown at each mixture angle from 0 to 90 degrees.

VII. Discussion

Without the MP3 codec, blind demixing with ICA attenuated the watermark in the music channel by more than 82 dB over a wide range of mixture angles, close to the 90 dB dynamic range of MP3-encoded music. When the linearly mixed signals were passed through the lossy MP3 codec, the residues were approximately 70 dB down from the wanted signals as measured at the 30- and 60-degree mixture angles. However, when we measured from 0 to 90 degrees in 1-degree steps, the residues were at least approximately 59 dB down. Despite the 23 dB performance deficiency with the MP3 codec, the watermark contamination in the music channel is likely to be imperceptible to the human auditory system. The high quality of the music recovery suggests that there is value in restricted computer codes for the restoration of music from watermarked MP3 digital music files.

The lossy MP3 coding format might affect the statistics of both the watermark and music signals when selected frequency components are nulled. However, in our evaluations with the selected simple tonal pairs representing the watermark and acoustic signals, the adverse effects on blind demixing with ICA are small.
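As a worked check of the attenuation figures quoted in this section, the deficiency and the corresponding amplitude ratios can be written out. The conversion uses the standard $20\log_{10}$ amplitude relation, which is an assumption about how the dB values were defined rather than something stated with the measurements.

```latex
\begin{align*}
82\ \mathrm{dB} - 59\ \mathrm{dB} &= 23\ \mathrm{dB}, \\
10^{-82/20} &\approx 7.9 \times 10^{-5}
  \quad \text{(residue-to-wanted amplitude ratio without the codec, at most)}, \\
10^{-59/20} &\approx 1.1 \times 10^{-3}
  \quad \text{(worst-case residue-to-wanted amplitude ratio with the codec)}.
\end{align*}
```

In other words, even in the worst measured case the watermark residue is on the order of one part in a thousand of the wanted signal amplitude.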

Some open research opportunities include recovery of the absolute amplitude and phase of the music signal. With ICA blind demixing, an unknown scale and permutation in the recovered outputs generally occur. A scale change could also manifest itself as a 180-degree phase shift in the music channel. For stereo music, this may destroy the “depth” perception. If some of the properties of the watermark are known a priori, then recovery of the watermark might serve as a guide to recover the music channel’s absolute amplitude and sense of phase.

VIII.  Summary  and  Conclusions 

This paper examines the application of independent component analysis (ICA), via unsupervised neural networks, to authenticity protection for multimedia products. The blind demixing capability of these neural networks extends signal processing from a one-sensor approach to a multi-sensor approach.

For watermark security, we propose a covert watermarking signal that serves as a vaccination against a dormant digital bacterium. Removal of the watermark triggers the bacterium, which degrades the quality of the product being pirated. We show that our digital-bacteria model meets established technical and ethical requirements for beneficial virus-like programs.

We apply our novel ICA watermarking method to digitally encoded acoustic music files. A watermark signal was linearly mixed with a music signal. The resulting mixture pair was put through an MP3 codec, and then blindly demixed with ICA to recover the music signal. The watermark contamination in the music channel was measured both when the MP3 codec was absent and when it was present.
Our experimental results show that watermark contamination in the music channel is likely to be inaudible, and that the adverse effects of MP3 coding on blind demixing with ICA are small. We conclude that our approach can provide a flexible, robust, and secure system for protecting the authenticity of multimedia products.